Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Mekhala Kumar
August 17, 2022
The dataset used was organicpoultry. It contains information about the quantity of different poultry types for all months across the years 2004 to 2013. Currently the month and year data all fall under a single column. In order to make the data easy to interpret, first, the column with the data of the month and year need to be separated into two columns.
Following which, the data needs to be pivoted in such a manner that the years become columns and the types of poultry become rows. This format will make it easier to select a subgroup within the types of poultry and compare the changes across years.
[1] "...1" "Extra Large \nDozen"
[3] "Extra Large 1/2 Doz.\n1/2 Dozen" "Large \nDozen"
[5] "Large \n1/2 Doz." "...6"
[7] "Whole" "B/S Breast"
[9] "Bone-in Breast" "Whole Legs"
[11] "Thighs"
# A tibble: 10 × 10
...1 Extra…¹ Extra…² Large…³ Large…⁴ Whole B/S B…⁵ Bone-…⁶ Whole…⁷ Thighs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 March 290 188. 268. 178 238. 704. 390.5 204. 216.25
2 April 290 188. 268. 178 238. 704. 390.5 204. 216.25
3 May 290 188. 268. 178 238. 704. 390.5 204. 216.25
4 June 290 188. 268. 178 238. 704. 390.5 204. 216.25
5 July 290 188. 268. 178 238. 704. 390.5 204. 216.25
6 August 290 188. 268. 178 238. 704. 390.5 204. 216.25
7 Septemb… 290 188. 268. 178 238. 704. 390.5 204. 216.25
8 October 290 188. 268. 178 238. 704. 390.5 204. 216.25
9 November 290 188. 268. 178 238. 704. 390.5 204. 216.25
10 December 290 188. 268. 178 238. 704. 390.5 204. 216.25
# … with abbreviated variable names ¹`Extra Large \nDozen`,
# ²`Extra Large 1/2 Doz.\n1/2 Dozen`, ³`Large \nDozen`, ⁴`Large \n1/2 Doz.`,
# ⁵`B/S Breast`, ⁶`Bone-in Breast`, ⁷`Whole Legs`
# A tibble: 6 × 10
...1 Extra …¹ Extra…² Large…³ Large…⁴ Whole B/S B…⁵ Bone-…⁶ Whole…⁷ Thighs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 Jan 2004 230 132 230 126 198. 646. too few 194. too f…
2 February 230 134. 226. 128. 198. 642. too few 194. 203
3 March 230 137 225 131 209 642. too few 194. 203
4 April 234. 137 225 131 212 642. too few 194. 203
5 May 236 137 225 131 214. 642. too few 194. 203
6 June 241 137 231. 134. 216. 641 too few 202. 200.3…
# … with abbreviated variable names ¹`Extra Large \nDozen`,
# ²`Extra Large 1/2 Doz.\n1/2 Dozen`, ³`Large \nDozen`, ⁴`Large \n1/2 Doz.`,
# ⁵`B/S Breast`, ⁶`Bone-in Breast`, ⁷`Whole Legs`
tibble [120 × 10] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:120] "Jan 2004" "February" "March" "April" ...
$ Extra Large
Dozen : num [1:120] 230 230 230 234 236 ...
$ Extra Large 1/2 Doz.
1/2 Dozen: num [1:120] 132 134 137 137 137 ...
$ Large
Dozen : num [1:120] 230 226 225 225 225 ...
$ Large
1/2 Doz. : num [1:120] 126 128 131 131 131 ...
$ Whole : num [1:120] 198 198 209 212 214 ...
$ B/S Breast : num [1:120] 646 642 642 642 642 ...
$ Bone-in Breast : chr [1:120] "too few" "too few" "too few" "too few" ...
$ Whole Legs : num [1:120] 194 194 194 194 194 ...
$ Thighs : chr [1:120] "too few" "203" "203" "203" ...
eggpoul<-eggpoul%>%
mutate(`Bone-in Breast` = parse_number(na_if(`Bone-in Breast`, "too few")),
Thighs = parse_number(na_if(Thighs, "too few")))
eggpoul<-eggpoul %>% separate(1, c("Month", "Year"), extra = "drop", fill = "right")
vec<-rep(c(1,2,3,4,5,6,7,8,9,10),each=12)
eggpoul$Year[vec==1] <- 2004
eggpoul$Year[vec==2] <- 2005
eggpoul$Year[vec==3] <- 2006
eggpoul$Year[vec==4] <- 2007
eggpoul$Year[vec==5] <- 2008
eggpoul$Year[vec==6] <- 2009
eggpoul$Year[vec==7] <- 2010
eggpoul$Year[vec==8] <- 2011
eggpoul$Year[vec==9] <- 2012
eggpoul$Year[vec==10] <- 2013
dim(eggpoul)
[1] 120 11
The original dataset has 120 rows and 11 columns. 2 of the variables are being used to identify a case. Hence,after pivoting, we expect to have 1080 rows and 4 columns. It is anticipated that the data will be long (taller).
After pivoting, the data has become taller. Pivoting has also ensured that all the variables of poultry types have been kept in a single column and the values corresponding to them are easy to access.
---
title: "Challenge 3 "
author: "Mekhala Kumar"
desription: "Tidy Data: Pivoting"
date: "08/17/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- organicpoultry
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Cleaning data and the reasoning for using pivot
The dataset used was organicpoultry. It contains information about the quantity of different poultry types for all months across the years 2004 to 2013. Currently the month and year data all fall under a single column. In order to make the data easy to interpret, first, the column with the data of the month and year need to be separated into two columns.\
Following which, the data needs to be pivoted in such a manner that the years become columns and the types of poultry become rows. This format will make it easier to select a subgroup within the types of poultry and compare the changes across years.\
```{r}
library(readxl)
library(tidyverse)
eggpoul <- read_excel("_data/organiceggpoultry.xls",skip=4)
View(eggpoul)
colnames(eggpoul)
eggpoul=subset(eggpoul,select=-c(...6))
tail(eggpoul, 10)
head(eggpoul)
str(eggpoul)
eggpoul<-eggpoul%>%
mutate(`Bone-in Breast` = parse_number(na_if(`Bone-in Breast`, "too few")),
Thighs = parse_number(na_if(Thighs, "too few")))
eggpoul<-eggpoul %>% separate(1, c("Month", "Year"), extra = "drop", fill = "right")
vec<-rep(c(1,2,3,4,5,6,7,8,9,10),each=12)
eggpoul$Year[vec==1] <- 2004
eggpoul$Year[vec==2] <- 2005
eggpoul$Year[vec==3] <- 2006
eggpoul$Year[vec==4] <- 2007
eggpoul$Year[vec==5] <- 2008
eggpoul$Year[vec==6] <- 2009
eggpoul$Year[vec==7] <- 2010
eggpoul$Year[vec==8] <- 2011
eggpoul$Year[vec==9] <- 2012
eggpoul$Year[vec==10] <- 2013
dim(eggpoul)
```
### Challenge: Describe the final dimensions
The original dataset has 120 rows and 11 columns. 2 of the variables are being used to identify a case. Hence,after pivoting, we expect to have 1080 rows and 4 columns. It is anticipated that the data will be long (taller).
```{r}
#existing rows/cases
nrow(eggpoul)
#existing columns/cases
ncol(eggpoul)
#expected rows/cases
nrow(eggpoul) * (ncol(eggpoul)-2)
# expected columns
(11-9)+2
```
### Challenge: Pivot the Chosen Data
After pivoting, the data has become taller. Pivoting has also ensured that all the variables of poultry types have been kept in a single column and the values corresponding to them are easy to access.
```{r}
eggpoul<-pivot_longer(eggpoul, 3:11, names_to = "Type of Poultry", values_to = "Amount")
dim(eggpoul)
View(eggpoul)
```